PART 1 : Figure 1.1: Summary Output - Initial Model

## 
## Call:
## lm(formula = cmRate ~ medianAge + pctPoverty + pctBach + rmRace, 
##     data = cancer_mortality_rates)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.963  -13.973    1.864   13.971  112.002 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 181.8395    15.4857  11.742  < 2e-16 ***
## medianAge     0.0216     0.2258   0.096    0.924    
## pctPoverty    0.9730     0.2260   4.306 1.98e-05 ***
## pctBach      -1.7385     0.2404  -7.233 1.66e-12 ***
## rmRacewhite   3.2094     6.7514   0.475    0.635    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.97 on 532 degrees of freedom
## Multiple R-squared:  0.2358, Adjusted R-squared:  0.2301 
## F-statistic: 41.05 on 4 and 532 DF,  p-value: < 2.2e-16

Figure 1.2: Residuals Plot - Initial Model


Figure 1.3: QQ Plot of Residuals - Initial Model


Figure 1.4: Histogram - Initial Model (with a log transformation)


Figure 1.5: Scatterplot Matrix of Data - Initial Model


Figure 1.6: Scatterplot - Initial Model


PART 2 : Figure 2.1: Summary Output - Updated Model 1 (with log transformation)

## 
## Call:
## lm(formula = cmRate ~ medianAge + log(pctPoverty) + log(pctBach) + 
##     rmRace, data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.644 -13.381   1.875  13.676 112.971 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     197.38383   23.71322   8.324 7.23e-16 ***
## medianAge         0.05195    0.21954   0.237 0.813030    
## log(pctPoverty)  14.76224    3.88662   3.798 0.000163 ***
## log(pctBach)    -24.23113    3.40192  -7.123 3.45e-12 ***
## rmRacewhite      -0.32411    6.44839  -0.050 0.959933    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.88 on 532 degrees of freedom
## Multiple R-squared:  0.2419, Adjusted R-squared:  0.2362 
## F-statistic: 42.43 on 4 and 532 DF,  p-value: < 2.2e-16
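
Because pctPoverty and pctBach enter this model in logs, each coefficient describes the effect of a proportional change in the predictor rather than a one-point change. The arithmetic below is a sketch, assuming log() is the natural logarithm (R's default) and taking a 10% relative increase as a purely illustrative choice:

\[
\Delta \widehat{cmRate} = \hat\beta \, \log(1.10) \approx 0.0953 \, \hat\beta
\]

\[
\begin{aligned}
\text{pctPoverty:} \quad & 14.76224 \times 0.0953 \approx 1.41 \\
\text{pctBach:} \quad & -24.23113 \times 0.0953 \approx -2.31
\end{aligned}
\]

Under these estimates, a 10% relative increase in pctPoverty is associated with a rise of about 1.41 in cmRate, and a 10% relative increase in pctBach with a fall of about 2.31, holding the other predictors fixed.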

Figure 2.2: Residuals Plot - Updated Model 1


Figure 2.3: QQ Plot of Residuals - Updated Model 1


PART 3 : Figure 3.1: Scatterplot with Color - Initial Model


PART 4 : Figure 4.1: Summary Output - Updated Model 2 (remove rmRace and add region)

## 
## Call:
## lm(formula = cmRate ~ medianAge + pctPoverty + pctBach + region, 
##     data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.645 -12.598   0.785  13.324 114.984 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     185.681713  12.265156  15.139  < 2e-16 ***
## medianAge         0.009403   0.218698   0.043 0.965723    
## pctPoverty        0.655445   0.216710   3.025 0.002611 ** 
## pctBach          -1.517447   0.236462  -6.417 3.08e-10 ***
## regionNortheast   3.223576   3.634499   0.887 0.375515    
## regionSoutheast  10.575464   2.680767   3.945 9.06e-05 ***
## regionSouthwest  -7.533614   3.753246  -2.007 0.045234 *  
## regionWest      -13.557700   3.605283  -3.761 0.000188 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.83 on 529 degrees of freedom
## Multiple R-squared:  0.3106, Adjusted R-squared:  0.3015 
## F-statistic: 34.06 on 7 and 529 DF,  p-value: < 2.2e-16

Figure 4.2: Residuals Plot - Updated Model 2


Figure 4.3: QQ Plot of Residuals - Updated Model 2


Figure 4.4: Scatterplot with Color - Updated Model 2

## `geom_smooth()` using formula = 'y ~ x'


Figure 4.5: Interaction - Updated Model 2

## 
## Call:
## lm(formula = cmRate ~ medianAge + pctPoverty + pctBach + region + 
##     region * medianAge, data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -83.025 -12.426   1.066  13.263 119.994 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               225.4976    16.9377  13.313  < 2e-16 ***
## medianAge                  -0.9096     0.3559  -2.556 0.010869 *  
## pctPoverty                  0.6485     0.2158   3.005 0.002781 ** 
## pctBach                    -1.6070     0.2361  -6.806 2.75e-11 ***
## regionNortheast           -24.9000    37.2851  -0.668 0.504536    
## regionSoutheast           -23.3533    21.4470  -1.089 0.276704    
## regionSouthwest           -84.5602    29.1778  -2.898 0.003911 ** 
## regionWest                -96.7608    23.2354  -4.164 3.65e-05 ***
## medianAge:regionNortheast   0.6767     0.8901   0.760 0.447398    
## medianAge:regionSoutheast   0.8012     0.5158   1.553 0.120971    
## medianAge:regionSouthwest   1.9233     0.7394   2.601 0.009555 ** 
## medianAge:regionWest        2.0077     0.5538   3.625 0.000317 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.58 on 525 degrees of freedom
## Multiple R-squared:  0.3308, Adjusted R-squared:  0.3168 
## F-statistic:  23.6 on 11 and 525 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = cmRate ~ medianAge + pctPoverty + pctBach + region + 
##     region * pctPoverty, data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -83.013 -12.717   1.336  12.913 109.817 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                172.35338   13.54165  12.728  < 2e-16 ***
## medianAge                   -0.04439    0.21822  -0.203 0.838886    
## pctPoverty                   1.78297    0.47566   3.748 0.000198 ***
## pctBach                     -1.49394    0.23702  -6.303 6.19e-10 ***
## regionNortheast             18.72030   13.20697   1.417 0.156943    
## regionSoutheast             25.30363    8.19235   3.089 0.002117 ** 
## regionSouthwest             17.14721   13.39397   1.280 0.201034    
## regionWest                  25.50834   11.71214   2.178 0.029855 *  
## pctPoverty:regionNortheast  -1.15373    1.00337  -1.150 0.250726    
## pctPoverty:regionSoutheast  -1.10246    0.51640  -2.135 0.033231 *  
## pctPoverty:regionSouthwest  -1.67496    0.79136  -2.117 0.034767 *  
## pctPoverty:regionWest       -2.66706    0.75429  -3.536 0.000442 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.63 on 525 degrees of freedom
## Multiple R-squared:  0.3278, Adjusted R-squared:  0.3137 
## F-statistic: 23.27 on 11 and 525 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = cmRate ~ medianAge + pctPoverty + pctBach + region + 
##     region * pctBach, data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -83.400 -12.271   0.917  13.144 115.606 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             187.52950   13.01787  14.406  < 2e-16 ***
## medianAge                -0.02629    0.21984  -0.120  0.90487    
## pctPoverty                0.57785    0.22148   2.609  0.00934 ** 
## pctBach                  -1.46626    0.37829  -3.876  0.00012 ***
## regionNortheast          -7.09326   12.08534  -0.587  0.55750    
## regionSoutheast          17.19739    6.87732   2.501  0.01270 *  
## regionSouthwest         -14.32425   11.43049  -1.253  0.21070    
## regionWest              -16.29305   11.31211  -1.440  0.15037    
## pctBach:regionNortheast   0.59556    0.71720   0.830  0.40670    
## pctBach:regionSoutheast  -0.54389    0.48873  -1.113  0.26627    
## pctBach:regionSouthwest   0.54405    0.82515   0.659  0.50997    
## pctBach:regionWest        0.16842    0.69081   0.244  0.80748    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.83 on 525 degrees of freedom
## Multiple R-squared:  0.3161, Adjusted R-squared:  0.3017 
## F-statistic: 22.06 on 11 and 525 DF,  p-value: < 2.2e-16
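
To judge whether each set of four interaction terms improves on the additive model in Figure 4.1, a partial F statistic can be approximated from the reported R-squared values. This is a sketch rather than output from the analysis itself: it uses the rounded values printed above, so the results are accurate only to about two digits. Here \(R^2_0 = 0.3106\) is the additive model, \(q = 4\) is the number of added terms, and 525 is the residual degrees of freedom of each interaction model.

\[
F = \frac{(R^2_1 - R^2_0)/q}{(1 - R^2_1)/525}
\]

\[
\begin{aligned}
\text{region} \times \text{medianAge:} \quad & F \approx \frac{(0.3308 - 0.3106)/4}{(1 - 0.3308)/525} \approx 3.96 \\
\text{region} \times \text{pctPoverty:} \quad & F \approx \frac{(0.3278 - 0.3106)/4}{(1 - 0.3278)/525} \approx 3.36 \\
\text{region} \times \text{pctBach:} \quad & F \approx \frac{(0.3161 - 0.3106)/4}{(1 - 0.3161)/525} \approx 1.06
\end{aligned}
\]

The 5% critical value of an F distribution with 4 and 525 degrees of freedom is roughly 2.4, so the medianAge and pctPoverty interactions appear to add explanatory power while the pctBach interaction does not.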

PART 5 (Final Model) : Figure 5.1: Summary Output - Final Model

lm_final <- lm(cmRate ~ medianAge + log(pctPoverty) + log(pctBach) + region +
                 medianAge * region + log(pctPoverty) * region,
               data = cancer_mortality_rates)
summary(lm_final)
## 
## Call:
## lm(formula = cmRate ~ medianAge + log(pctPoverty) + log(pctBach) + 
##     region + medianAge * region + log(pctPoverty) * region, data = cancer_mortality_rates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.813 -12.483   1.388  12.908 115.309 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     197.0484    27.0751   7.278 1.26e-12 ***
## medianAge                        -0.7709     0.3517  -2.192  0.02883 *  
## log(pctPoverty)                  23.8175     6.4767   3.677  0.00026 ***
## log(pctBach)                    -20.1774     3.4251  -5.891 6.89e-09 ***
## regionNortheast                  19.9793    49.2747   0.405  0.68530    
## regionSoutheast                  10.8963    31.1692   0.350  0.72679    
## regionSouthwest                 -32.4127    46.0680  -0.704  0.48201    
## regionWest                      -29.5334    46.3600  -0.637  0.52438    
## medianAge:regionNortheast         0.5759     0.8891   0.648  0.51748    
## medianAge:regionSoutheast         0.7103     0.5174   1.373  0.17038    
## medianAge:regionSouthwest         1.8181     0.7393   2.459  0.01424 *  
## medianAge:regionWest              1.7963     0.5690   3.157  0.00169 ** 
## log(pctPoverty):regionNortheast -16.1591    12.4488  -1.298  0.19485    
## log(pctPoverty):regionSoutheast -12.1815     7.7235  -1.577  0.11535    
## log(pctPoverty):regionSouthwest -18.0939    12.4148  -1.457  0.14559    
## log(pctPoverty):regionWest      -22.5391    13.0483  -1.727  0.08469 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.55 on 521 degrees of freedom
## Multiple R-squared:  0.3379, Adjusted R-squared:  0.3189 
## F-statistic: 17.73 on 15 and 521 DF,  p-value: < 2.2e-16

Figure 5.2: Residuals Plot - Final Model


Figure 5.3: QQ Plot of Residuals - Final Model


Figure 5.4: VIF - Final Model

##       medianAge log(pctPoverty)    log(pctBach) 
##        1.111866        1.701780        1.672570
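
For reference, the variance inflation factor of predictor \(j\) is defined from \(R_j^2\), the R-squared obtained by regressing that predictor on the other predictors. Inverting the definition recovers the implied \(R_j^2\) for each value above. The output does not show which predictors entered these auxiliary regressions, so the figures below take the reported VIFs as given.

\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad \Longrightarrow \quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}
\]

\[
\begin{aligned}
\text{medianAge:} \quad & R_j^2 = 1 - 1/1.111866 \approx 0.101 \\
\log(\text{pctPoverty}): \quad & R_j^2 = 1 - 1/1.701780 \approx 0.412 \\
\log(\text{pctBach}): \quad & R_j^2 = 1 - 1/1.672570 \approx 0.402
\end{aligned}
\]

All three VIFs are well below the common rule-of-thumb thresholds of 5 or 10, which suggests multicollinearity among these numeric predictors is not a serious concern.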

Final Model - Conditional Mean Function (CMF)

\(E(Y_{cmRate} \mid X) = \beta_0 + \beta_1 X_{medianAge} + \beta_2 \log(X_{pctPoverty}) + \beta_3 \log(X_{pctBach}) + \beta_4 I_{Northeast} + \beta_5 I_{Southeast} + \beta_6 I_{Southwest} + \beta_7 I_{West} + \beta_8 I_{Northeast} \cdot X_{medianAge} + \beta_9 I_{Southeast} \cdot X_{medianAge} + \beta_{10} I_{Southwest} \cdot X_{medianAge} + \beta_{11} I_{West} \cdot X_{medianAge} + \beta_{12} I_{Northeast} \cdot \log(X_{pctPoverty}) + \beta_{13} I_{Southeast} \cdot \log(X_{pctPoverty}) + \beta_{14} I_{Southwest} \cdot \log(X_{pctPoverty}) + \beta_{15} I_{West} \cdot \log(X_{pctPoverty})\)
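
As a worked reading of the CMF, the medianAge slope within a region is the main-effect coefficient \(\beta_1\) plus that region's interaction coefficient. The values below plug in the point estimates from Figure 5.1. The reference region is whichever level of region is absorbed into the intercept, which the output does not name.

\[
\begin{aligned}
\text{reference region:} \quad & \hat\beta_1 = -0.7709 \\
\text{Northeast:} \quad & \hat\beta_1 + \hat\beta_8 = -0.7709 + 0.5759 = -0.1950 \\
\text{Southeast:} \quad & \hat\beta_1 + \hat\beta_9 = -0.7709 + 0.7103 = -0.0606 \\
\text{Southwest:} \quad & \hat\beta_1 + \hat\beta_{10} = -0.7709 + 1.8181 = 1.0472 \\
\text{West:} \quad & \hat\beta_1 + \hat\beta_{11} = -0.7709 + 1.7963 = 1.0254
\end{aligned}
\]

On these estimates, cmRate falls by about 0.77 per additional year of median age in the reference region but rises by about 1.0 in the Southwest and West, holding poverty and education fixed. These are point estimates only. The standard errors of the summed slopes depend on coefficient covariances that the summary output does not report.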